Text File | 1992-07-11 | 72.6 KB | 1,300 lines

Guidelines to use 8-bit character codes

Version 2. July 1992.
A. Pirard
University of Liege
Belgium

Important preliminary notice

This file contains translation tables between proprietary codes and ISO codes. As indicated, some tables translate several characters arbitrarily, for lack of a known definition of the translation by the owner of the code (the constructor). So, watch this space for an update reporting any news as I get to know it.

Since version 1:

- At the request of SHARE, IBM has:
  - defined a new code page 1047 compatible with the de-facto EBCDIC.
  - defined a new code page 819 corresponding to ISO 8859-1.
  - published a document listing the translation between 819 and the SAA code pages 850 and 500, from which other translations may be deduced.
  - see the summary of changes at the top of the paragraph about IBM.
- So, the translation tables between 8859-1 and PC codes have been changed accordingly.
- The translation of the Macintosh code has been changed to account for 6 ISO characters that appear in an Icelandic Macintosh code and were translated arbitrarily otherwise. This pushed away 4 other arbitrary translations.
- IBM code pages 850 and 1047 are considered the preferred tables; other translations were moved to a secondary file to reduce size.

Changes to the text:

- more complete explanation of keyboard handling for the PC.
- updated explanations to follow the evolution of usage and terminology.
- minor revisions for clarity.

Introduction.

In the course of my work in communications in a French-speaking environment -- writing programs, installing but mostly having to adapt others -- I discovered facts, notions, techniques and data related to the use of international characters. Many English-speaking programmers are willing to extend the scope of their software to what are for them "foreign languages". Discussions with them are often lengthy, because numerous details have to be conveyed that are obvious to one party and obscure to the other.
Trying to help without repeating the same words all over again is the reason for this document.

This text is restricted to the problem of the character codes used in data. Yet, I should mention briefly that isolating the user interface messages from executable code is a real plus. These messages should be easily translatable by anyone who knows the language, even if the source is unavailable. Anything similar to the Macintosh resources is ideal. Lest this goal seem too easy, I must warn that phrases in many languages are longer than in English and that the order of inserts may vary depending on grammar.

I am much indebted to the people I met on networks and on the mailing list ISO8859@JHUVM for their discussion (especially Edwin Hart HART@APLVM, with his SHARE White Paper to IBM). The international community owes much to the Kermit developers group led by Christine Gianone and fed by Frank da Cruz and many volunteers, who produced several Kermit versions using the principles described in this document and who store character code related data on WATSUN.CC.COLUMBIA.EDU:kermit/charsets. I should also thank many other people for their interest, especially those who adapted their programs, but I am truly unable to mention them all. You will know when some ISO 8859 setting catches your eye.

At the risk of a lack of justification, I have made every effort to keep this text as concise as possible to spare your time. One will have to think beyond the text in some places. On the other hand, please excuse me if some paragraphs state the obvious: it is sometimes needed. Also remember that English is not my mother tongue...

A language among others: French.

Like many other languages, French uses characters not found in English. It likes to adorn them with diacritics (accents). Other languages use other characters, from a few like German to totally different ones like Russian and Greek, or even the right-to-left Arabic and Hebrew.
To the question: "could you do without them?", I like to reply that forgetting them in "a la francaise" makes it mean "has the French girl". "a" must take a grave accent to distinguish the preposition from the verb and "c" takes a cedilla. French without diacritics is certainly not unreadable and rarely ambiguous with the aid of context (i.e. to humans but not to computers), but it is just as unpleasant as all-uppercase text and difficult to read: one stumbles on most missing accents, like when proof-reading one's kid's dictation. In the general case, many languages cannot do without their own characters anyway.

Terms.

A "character" is what one writes down on paper. A "code" is a computer representation of a set of characters that we can see as associated with numbers called "code points". A code usually includes "control characters" for which a graphic representation does not normally exist, because they are only used to control the operation of hardware or have special meanings to programs.

7-bit character codes

ASCII (ANSI X3.4) was defined as a 7-bit code for English at a time when hardware was really hard and expensive. To allow the use of some of the particular characters that other languages need, it was later decided that a defined subset (the least used ones) could be replaced. This is ISO 646. Several languages had the subset replaced with their own characters. This is what can be done with an Escape sequence on Epson printers to switch to a national language. ANSI X3.4 became an instance of ISO 646.

But, for some languages like French, the number of characters that can be replaced is not enough, and text processing of those days made extensive use of backspaces and overstrikes for the missing ones. On the other hand, replacing programming symbols with national characters introduces much confusion in programming languages, like a comment being terminated by its own text, and in several uses of those characters (e.g.
in e-mail or Unix) where the national meaning clashes with the ASCII one.

US EBCDIC (an IBM code) used more or less the same characters as ASCII, but at different code points. I should say "more and less". Some ASCII characters did not exist in EBCDIC (e.g. square brackets) and EBCDIC had some (cent sign, not sign) that were not in ASCII. As a consequence, the translation between ASCII and EBCDIC was strictly speaking undefined, and IBM never officially defined a complete one. Users defined one translation, which resulted in a so-called de-facto EBCDIC containing all the characters of ASCII, that all ASCII-related programs use. Although EBCDIC was an 8-bit code "with holes", IBM made the same character replacements as ISO 646 in hardware to be used with other languages (but, again, as other characters were missing, this was of little use to French).

Even though data was stored in octets, 7-bit communication lines were used, and it was (and still is) common practice for software to strip off the 8th bit despite a possible extension of the code, future or existing.

We lived through a long time of computer frustration. Is the problem solved?

8-bit character codes

Storing in a database text full of "this backspace that", trying to sort it etc... or getting a Sterling pound bill paid in dollars because that's what the dollar sign is replaced with in the English version of ISO 646 was a real pain and an insult to the octet. It was soon realized that, even if text processing could cope to some extent with compound characters, data processing could not at all. One character must be one data element of constant width.

With the era of cheaper hardware and microcomputers, manufacturers started to use the upper half of the 256 code points of the common 8-bit byte for international characters. It was one major reason for the success of these computers on the international scene.
But there was no standard, and each manufacturer chose in its own way which characters and which code points to use, like today's DEC, Apple, Atari, Commodore or other lesser known brands. The IBM PC was built with yet another code that was later called "code page 437" and that everyone in the compatible business settled on. But IBM also built PCs with variations for countries using characters that were not in 437, now called 860, 863 and 865.

There was an evident Babel and a new standard had to be set. National institutions and many constructors participated in producing the ISO 8859 standard. As 256 code points are not enough for all the languages in the world, several "versions" of this standard exist (see below for a list, still evolving). ISO 8859-1 is for group 1 of Latin-based languages and covers Western Europe, including English, hence many major countries in North and South America, Australia and many others worldwide.

A new multibyte standard is being prepared: ISO 10646 -- in which ISO 8859-1 is a contiguous subset --, that will cover all languages in a single code. "Unicode" -- a code being defined by a consortium of manufacturers -- and ISO 10646 have joined: Unicode will be a 2-byte subset of the 4-byte ISO 10646, with the remarkable result of a single worldwide code.

Until ISO 10646 can be used, today's hardware and software, strongly single-byte oriented, can easily extend the scope of a character code to 8 bits and one version of ISO 8859. It is regrettable indeed that the particular version in use is implicit to a group of languages, but it must be understood that this is a dramatic improvement in a country or a group of countries where the code of the data is implicit anyway.

For short, in the following text I may call "ISO 8859" or simply "ISO" any version that a system uses at any one time, assuming that systems do not switch versions dynamically, but that the user can set up the choice of the version he uses, if it is not implied by hardware.

ISO 8859 (any version) is an extension of ASCII.
The upper half (in fact, 128-159 are reserved for more control characters) is filled with characters for a group of countries.

The present trend to use ISO 8859 is certain. Version 1 is much like DEC's earlier "8-bit ASCII code", and VT terminals now have a setup to use 8 bits and ISO 8859 (and Escape sequences to switch among and display several ISO 8859 versions). Looking at Microsoft and Lotus international codes, one notices that they soon adopted a "pre-release" of ISO 8859-1 (Microsoft calls ISO 8859-1 "ANSI code" in their documentation of Windows). As explained below, IBM have adopted ISO 8859-1 their own way. X-Windows specifications (from MIT, of a presentation system on a remote graphic terminal) prescribe that ISO 8859-1 is to be used on the communication line. By mutual agreement, a growing number of universities and institutions exchange data in ISO.

- ISO 8859-1, Latin Alphabet 1, for Dutch, English, Faeroese, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, and Swedish.
- ISO 8859-2, Latin Alphabet 2, for Albanian, Czech, English, German, Hungarian, Polish, Romanian, Serbo-Croatian, Slovak, and Slovene.
- ISO 8859-3, Latin Alphabet 3, for Afrikaans, Catalan, English, Esperanto, French, Galician, German, Italian, Maltese, and Turkish.
- ISO 8859-4, Latin Alphabet 4, for Danish, English, Estonian, Finnish, German, Greenlandic, Lappish, Latvian, Lithuanian, Norwegian, and Swedish.
- ISO 8859-5, the Latin/Cyrillic Alphabet, for Bulgarian, Byelorussian, Macedonian, Russian, Serbo-Croatian, and Ukrainian.
- ISO 8859-6, the Latin/Arabic Alphabet.
- ISO 8859-7, the Latin/Greek Alphabet.
- ISO 8859-8, the Latin/Hebrew Alphabet.
- ISO 8859-9, Latin Alphabet 5, for Danish, Dutch, English, Faeroese, Finnish, French, German, Irish, Italian, Norwegian, Portuguese, Spanish, Swedish, and Turkish.

The "foreign" environment.

So, these facts of language make our typewriters different, and computer keyboards are modelled after them.
A few letters moved about, digits on the uppercase side, accented letters in place of programming symbols etc... More striking, if you pardon the pun, is that -- because the number of keys is not enough for all the French characters -- some so-called dead-keys are used to compose accented letters: a strike of one of them followed by another letter gives a single code point as program input, just like a typewriter could overtype.

It must be realized that, to an international computer user, an 8-bit code is just as natural as the 7-bit one is to English-speaking users. 8-bit code points "come out" of some plain keys of the keyboard and are expected to display. If a program filters them out, this will be shocking. If it uses these code points for internal control functions, the user will be confused by "strange behavior" that a US keyboard would never exhibit. For example, if it strips the 8th bit of a PC e-grave (8Ah), it produces a disturbing linefeed (0Ah). Or if a program decides that normal characters belong to the range 32-127, this will play havoc. It is worth checking a program with such data, which some keyboards can produce with alternate input.

Trust little about the keyboard layout and physical scan-codes. The only reliable input is through the operating system or the country-configurable keyboard driver interface. Working with physical input means trying to duplicate the varying and sometimes complicated logic of those drivers (maybe covering several keyboards per country) and heading for problems or incomplete coverage. Assuming that one strike can be transformed to one code point is incorrect, because of the dead-keys. Using the state of the special keys of the PC (Shift, Ctrl, Alt etc...) to try to modify the meaning of what the system outputs (a usual feature of communication programs) is not the best idea either, because keyboard recorders rarely replay the shift states along with that output.
And, in general, mixing input from different levels is unsafe: strictly speaking, these states are asynchronous with the input, and one may read a key code when the shift state has already disappeared. Yes, a program is usually faster than the user, but can one swear that a fast, long buffered auto repeat makes this true in all cases? Imagine your output being blocked by network flow control... Oh yes, it can happen.

As an example, here is what can be done on that PC I know well. The keyboard driver outputs 2 bytes, H and L.

When H is nonzero, it is the physical position of the key pressed; so, unless the documentation really wishes to refer to the key by position and not keycap for such things as diamond-shaped or in-a-row key groups, ignore this value and simply use L as final data: it is extended ASCII to be used as such (or, at most, to go through a code translation as discussed hereafter). Note that different keys (different H) may produce the same code point (L); e.g. L is 0 for an Alt/literal-number of the PC.

When H is zero, a special key combination has been pressed, indicated by the value of L, which is to be used to index a table of actions of the program. The PC defines 166 such special key combinations (0+L) and the intention of the application designer -- when using modifying shift states -- is to provide more, or, also, those the user really wants. The 90 remaining values of L are probably enough for additional definitions (but some or all of the 166 could even be redefined, or even an "impossible" H combined with L as a 256-fold multiplier).

Hence, the simplest method is to assign pre-defined additional pseudo-scan-codes (0+L) -- and, I repeat, certainly not extended ASCII code points -- to the actions of the program and to manage to have the keyboard driver produce them on any key the user chooses. Here is how to do that.

Each time a key is pressed (or released), the keyboard driver -- be it in ROM or the keyb...
driver -- calls software interrupt 15h with 4Fh in register AH, with the carry flag set and the physical scan code of the key in register AL (ORed with 80h when released, so that this case can easily be ignored). The application may intercept interrupt 15h and test for a key it wants together with the shift key states it wants (safe at this level).

- If AH is not 4Fh or AL is an unwanted key, the processor's flags and registers are left as on program entry (with carry set), and control is transferred to the next interrupt 15h handler (as any well-behaved interceptor must do); eventually, the keystroke will be used or ignored by someone else. Usually, this transfers to a dummy interrupt and returns to the keyboard driver to use the keystroke in the normal way.
- If the key is wanted, interrupt 16h is called with AH=5 and CH+CL set to what is to be placed in the keyboard buffer queue to be the input to the application. Then, return is made to the caller with the carry flag cleared, to indicate to the keyboard driver that the keystroke is used and that it is to ignore it and clean up the hardware interrupt.
- One can insert anything in the keyboard buffer, extended ASCII (PC code) or pseudo-scan-code: a keyboard recorder will receive that and replay it faithfully (but, of course, your inventions will be meaningful only to your own application). This is even the way to produce "Enter" with the right Ctrl key as IBM 3270 emulations do (if you really insist; I personally hate this). However, remember that inserting extended ASCII may conflict with the choice of a particular keyboard or code page for which it is different: again, the keyboard driver knows much better about that.

I am no specialist of the Macintosh internals, but I guess there's a similar story to tell for it.

8-bit codes in communications.
We now realize that exchanging data between those computers with proprietary 8-bit codes is, to international users, exactly like sending data from an ASCII machine to an EBCDIC one: translation has to occur somewhere. Who is to translate what to what?

Communication, if it is to work at all, relies heavily on strict standards. If communication between EBCDIC and ASCII computers is feasible, it is because of the well known fact -- so well known that one often forgets to state it -- that characters on a communication line must be ASCII. Just imagine there were no such rule. Just realize that there is no clearly spoken equivalent for international characters, just tacit agreement. It is urgently needed, to stop all sorts of hacking.

I know of at least 25 different codes with characters similar to ISO 8859-1 that a file receiver would have to try to detect and know if there were no rule. This makes over 1000 translation tables. This text advocates standard communication and simplicity with one code on a given computer.

The only solution is to state that each and every octet of text data carried on a communication line cannot be anything other than an official standard and that, while waiting for a single multi-octet standard, each language uses only one standard. ISO 8859 fills this purpose and is the only official standard. It is already used by major firms and some protocols like X-Windows.

Conclusions.

A) An "8-bit clean" computer is one allowing characters to have the 8th bit set. If such a computer (like more and more Unixes these days) is to choose a code, the obvious, painless one to avoid any translation is the standard: a version of ISO 8859. Note that such a machine becomes code-dependent only through 1) the system messages in the user's language and 2) the terminals and other peripherals used to display and enter the data (hence, other messages).
It might seem that owning a uniform environment of PCs or Macs and their printers could make their code the best choice for a nearby Unix machine. In the long run, this will cause problems when that environment is integrated in networking with other sites. And internetworking is moving fast and spreads standards. Better start right than have a computerful of data to translate one day. By the way, note that most terminal emulators already use ISO 8859-1.

B) If a computer is forced to continue using a code different from, but with a character set similar to, a version of ISO 8859, it must behave, with regard to what it sends on and receives from communication lines, as if it were using that version of ISO.

This means that the key feature of protocols (like file transfer in text mode or electronic mail) is to implement translation of the data that the protocol exchanges with the communication line. This applies both to services provided by a host and to terminal (client) functions provided by stations. In normal usage, this translation is expected to always be to ISO 8859, but, to ease the transition period, the translation may be selectable, especially to revert to the compatible case of null translation. However, the user should be advised that the preferred translation is to ISO (and that it in no way impairs communication restricted to ASCII).

In such a case, a requirement is to define a "best fit" translation between the proprietary code and that ISO version for text file transfer. Characters identical in both sets produce a meaningful code point translation; the translation of other characters is arbitrary but must be well defined. The important point is that this translation must be one to one and invertible for all the 256 characters (that is, each character translates to a different one and the reverse translation returns the original value). The translation of the lower half of an extension of ASCII is null.
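The requirement of a one to one, invertible "best fit" table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's "cp850" codec stands in for the constructor's definition of the PC code, the names build_to_iso and invert are invented for this sketch, and the pairing of the left-over characters is one arbitrary (but well defined) choice, not the tables listed later in this document.

```python
# Sketch only: Python's "cp850" codec stands in for the constructor's table.

def build_to_iso(codec="cp850"):
    """Return a 256-entry list: local code point -> ISO 8859-1 code point."""
    table = [None] * 256
    leftovers = []                          # local points with no ISO twin
    for local in range(256):
        char = bytes([local]).decode(codec)
        if local < 0x80:
            table[local] = local            # lower half: null translation
        elif 0xA0 <= ord(char) <= 0xFF:
            table[local] = ord(char)        # same character in ISO 8859-1
        else:
            leftovers.append(local)
    used = {iso for iso in table if iso is not None}
    free = [iso for iso in range(256) if iso not in used]
    # Arbitrary but well defined: pair leftovers with free slots, in order.
    for local, iso in zip(leftovers, free):
        table[local] = iso
    return table

def invert(table):
    """Return the reverse table; fail loudly if the table is not one to one."""
    if sorted(table) != list(range(256)):
        raise ValueError("translation is not one to one")
    inverse = [0] * 256
    for local, iso in enumerate(table):
        inverse[iso] = local
    return inverse

TO_ISO = bytes(build_to_iso())
TO_LOCAL = bytes(invert(list(TO_ISO)))

every_octet = bytes(range(256))
round_trip = every_octet.translate(TO_ISO).translate(TO_LOCAL)
print("invertible for all 256 code points:", round_trip == every_octet)
```

With such a pair of tables, text typed on the PC, for instance "à la française".encode("cp850").translate(TO_ISO), arrives as valid ISO 8859-1, while box-drawing characters travel as arbitrary code points and come back intact after the reverse translation.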
This kind of translation is valuable even if it translates characters to totally different ones in operations like file transfer, instead of trying to obtain look-alike or multiple ones. The reason is that doing otherwise may permanently corrupt data that cannot be fully processed later, be it just to return or forward it. It is better to obtain partially meaningless data (in appearance) and to be able to process it locally (e.g. print it more meaningfully) than to assume that the goal of network transfer is final usage. Note that even if a system leaves a subset of the code points unused, it may have to receive files from systems that use them.

A main difficulty is that this translation should be unique for a given system, so that two computers running this system are able to exchange data in their own code under the above rules (translation to ISO) without data loss. It is clear that a proprietary communication protocol (like NETBIOS) can use the proprietary code without translation. (Yet, one day, that protocol may well extend to other computers, causing difficulty.) But, in internetworking, and especially with electronic mail, a computer should not be expected to necessarily know the type of machine (hence code) of the other party.

The constructor (the owner of the proprietary code) should define this translation precisely but sometimes fails to do so. Consequently, one goal of this document is to suggest such a translation as widely as possible.

Terminal emulation deserves a special discussion. For communication programs (usually providing VT100 terminal emulation), it is not necessary to provide the full features of the higher VT models that can switch character codes to achieve international character support. Moreover, it is not desirable to ask that the hosts a terminal is connected to send character code switching escape sequences in order to initiate the use of national characters.
What is needed is just to be able to set up terminal mode with an initial state for what the GR code points (values above 127) display. This way, using ISO 8859 will only be a "matter of fact" to the 8-bit-clean host, and neither side has to know about code switching. This is especially true when the only possible display a microcomputer can achieve is by translating ISO from the line to its own similar character set, like the IBM PC or an Apple Macintosh with standard fonts. In short, VT100 emulation is sufficient, but with added translation before display and from the keyboard.

Now, one important remark about implementing translation with a proprietary code in a communication program. Two methods are possible.

A) Text is translated at the communication line interface. Hence, the proprietary code is used for text in computer memory.

B) Text is translated at the other system interfaces (screen, keyboard, file). Hence, ISO 8859 is used for text in memory.

The choice of the method depends on a number of factors.

- If the communication protocol is such that all data on the line is text, method A is the easiest. If there is a mix of text and binary and a minimum of interface points where text can be translated cannot be found, then method B should be considered.
- If the system interfaces can be well localized (e.g. routines in the program to interface the screen, keyboard and files of the PC), method B is easy. Otherwise (e.g. on the Macintosh, where multiple system interfaces exist with text as parameters) method A may be better (unless maybe, on the Mac, ISO fonts were used just for this reason, which is not very practical except for a terminal emulation program).
- If the proprietary code is not unique (like the multiple codes in use on the PC), method B is best unless an interface is built to translate the internal program messages to the current code.
- Using ISO in memory makes the program messages more portable.
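The difference between methods A and B is only where the translation sits, which a small Python sketch may make concrete. Everything here is an assumption for illustration: the class and method names are invented, "cp850" plays the role of the proprietary code, and the strict codec calls only handle characters common to both codes (a real program would use a full invertible table).

```python
LOCAL, ISO = "cp850", "latin-1"      # assumed local code and line code

def iso_to_local(data: bytes) -> bytes:
    return data.decode(ISO).encode(LOCAL)

def local_to_iso(data: bytes) -> bytes:
    return data.decode(LOCAL).encode(ISO)

class MethodA:
    """Translate at the line interface; memory holds the proprietary code."""
    def __init__(self):
        self.memory = b""
    def from_line(self, data: bytes):
        self.memory += iso_to_local(data)    # single translation point
    def to_screen(self) -> bytes:
        return self.memory                   # already in the local code
    def to_line(self) -> bytes:
        return local_to_iso(self.memory)

class MethodB:
    """Translate at screen, keyboard and file; memory holds ISO 8859."""
    def __init__(self):
        self.memory = b""
    def from_line(self, data: bytes):
        self.memory += data                  # the line is ISO already
    def to_screen(self) -> bytes:
        return iso_to_local(self.memory)     # one of several system interfaces
    def to_line(self) -> bytes:
        return self.memory

incoming = "déjà vu".encode(ISO)
a, b = MethodA(), MethodB()
a.from_line(incoming)
b.from_line(incoming)
print(a.to_screen() == b.to_screen(), a.memory == b.memory)
```

Both objects put the same octets on the screen and on the line; they differ only in what is kept in memory, which is why the number and location of the system interfaces decides between them.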
Two typical examples: a terminal emulation with file transfer (Kermit style) on a PC used method B with advantage; a file transfer program (TCP/IP FTP) on a Mac used method A with great simplicity (e.g. the filenames in the FTP dialog were translated along with everything else, whereas method B would have required acting at various points of the Macintosh API).

Moral.

I can hear those who have read this far say they did not suspect such problems. You will now understand why it is important to write 8-bit clean software, to use a single code on one computer, why by far the most interesting code today is ISO 8859 (the Unix advice), and why applications running on systems that cannot be converted have to translate text.

IBM and ISO 8859-1 (general, see details before the IBM tables)

For the PC, IBM has now adopted the character set of ISO 8859-1 with a different code. This was done by replacing some characters of the original PC code, now called code page 437, to obtain the full character set of ISO 8859-1. This new code is called "code page 850" and IBM sees it as the preferred code page for all Latin-1 customers (it's their default code for OS/2). See appendix D of the "DOS reference manual" for a description of 850 and the code pages it may replace: 437, 860, 863 and 865. Beware, the yen, cent, and two paragraph symbols that existed in 437 were moved in 850. When one builds a translation table between 850 and ISO 8859-1, 32 characters of 850, mainly box-drawing, are left to be assigned to the 32 control characters 80-9F of ISO.

For the EBCDIC mainframes, IBM decided that, because terminals were already using the ISO-646-like replacements to the US EBCDIC, they had to stay compatible. They extended each such "national EBCDIC" to a "country extended code page" (CECP). Thus, there are as many EBCDICs as versions of ISO 646 (which is what ISO 8859 is trying to avoid). None of the original CECPs was compatible with the de-facto EBCDIC.
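The two statements about code page 850 (32 characters left over, and characters moved relative to 437) can be checked with a short Python sketch. It assumes that Python's "cp437" and "cp850" codecs reflect IBM's definitions; the helper name upper_half is invented here. Note that these codecs treat 14h and 15h as control characters, so the two paragraph symbols that 437 displays there do not show up in this check.

```python
def upper_half(codec):
    """Map each character of the upper half (80h-FFh) to its code point."""
    return {bytes([b]).decode(codec): b for b in range(0x80, 0x100)}

cp437, cp850 = upper_half("cp437"), upper_half("cp850")
latin1 = {chr(c) for c in range(0xA0, 0x100)}     # ISO 8859-1 graphics

missing_from_850 = latin1 - set(cp850)            # Latin-1 characters 850 lacks
left_over_in_850 = sorted(set(cp850) - latin1, key=cp850.get)

# Latin-1 characters present in both PC codes but at different code points.
moved = {ch: (cp437[ch], cp850[ch])
         for ch in cp437
         if ch in cp850 and ch in latin1 and cp437[ch] != cp850[ch]}

print("Latin-1 characters missing from 850:", len(missing_from_850))
print("850 characters left over (the text says 32):", len(left_over_in_850))
for ch, (old, new) in sorted(moved.items(), key=lambda item: item[1]):
    print(f"{ch} moved from {old:02X}h in 437 to {new:02X}h in 850")
```

The left-over characters are the ones a complete table has to pair, arbitrarily, with the ISO code points 80-9F.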
Lately, IBM defined CECP 1047, which is compatible with (an extension of) the de-facto US EBCDIC (see discussion below). Consequently, I consider that CECP 1047 is the most interesting EBCDIC code to use, because of its compatibility with the vast software base.

- CECP 1047 "internationalized industry standard" (my terms)
- CECP 037 for US, Canada-French, Netherlands, Portugal.
- CECP 273 for Germany.
- CECP 277 for Denmark and Norway.
- CECP 278 for Finland and Sweden.
- CECP 280 for Italy.
- CECP 284 for Latin America and Spain.
- CECP 285 for United Kingdom.
- CECP 297 for France.
- CECP 500 for Belgium, Switzerland-French and Switzerland-German.

Like 850, all these codes contain all the characters of ISO 8859-1.

Only the recent CECP 1047 is compatible with a de-facto standard EBCDIC, corresponding to a de-facto ASCII/EBCDIC translation, that a huge number of products settled on long ago, including software from IBM:

- all compilers from IBM or others: C, REXX, PL/I, Pascal, for those sensitive to the differences in code points,
- file transfer programs like Kermit, PCTERM, and IBM TCP/IP,
- in fact, the whole of IBM TCP/IP,
- terminal emulation: TTY line mode or 3270 emulation by the 7171,
- ASCII tape translation,
- products to translate ASCII to EBCDIC on a mainframe: ARCUTIL ...
- products that should produce ASCII, but produce EBCDIC because data goes through EBCDIC/ASCII translation: e.g. SAS output for Tektronix,
- products that convert this output anyway, because the expected EBCDIC/ASCII translation does not occur: LINEMODE through the 7171 transparent mode,
- similarly, TPRINT to print in this transparent mode,
- certainly many other products I don't know of or I forget, because, as you see, the de-facto EBCDIC snowballs from one use to the other,
- last but far from least, it's the translation made by most gateways that relay mail between BITNET and the Internet, i.e. between EBCDIC mail and ASCII mail.
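Why a single agreed EBCDIC matters for such gateways can be illustrated with a small Python sketch. It is a stand-in only: CECP 1047 is not among Python's standard codecs, so the sketch uses "cp037" and "cp500" to play two gateways that disagree, and the function name through_gateways is invented. The characters tested are the range 20h-5Fh, which is what UUENCODE-style encodings typically emit.

```python
# The 64 characters 20h-5Fh, as typically produced by UUENCODE.
UU_CHARS = "".join(chr(c) for c in range(0x20, 0x60))

def through_gateways(text, to_ebcdic, to_ascii):
    """ASCII -> EBCDIC with one table, then EBCDIC -> ASCII with another."""
    return text.encode(to_ebcdic).decode(to_ascii)

same = through_gateways(UU_CHARS, "cp037", "cp037")
mixed = through_gateways(UU_CHARS, "cp037", "cp500")
damaged = [sent for sent, got in zip(UU_CHARS, mixed) if sent != got]

print("same table both ways, data intact:", same == UU_CHARS)
print("mismatched tables, damaged characters:", "".join(damaged))
```

Any damaged character makes an encoded file undecodable at the far end, which is the practical reason behind the SHARE requirement discussed next.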

Of special importance is the encoding of data that is to be transmitted by e-mail (UUENCODE, BOO, HQX...): if the ASCII-EBCDIC-ASCII translation fails to be invertible, decoding fails.

Requirement #1 of SHARE is that IBM use a single EBCDIC code for Latin group 1 and publish it. Using an extension of the de-facto EBCDIC is recommended.

Asynchronous communication

Thanks to the interest of Frank da Cruz and Christine Gianone, Kermit now defines specifications to support ISO 8859 (and other codes if needed) on the communication line in terminal and file transfer mode. It has provision to extend to mixed-code files too. John Chandler has extended the traditional translation made by his remarkable IBM mainframe Kermits to the specific choice of any CECP or the extended de-facto EBCDIC to ISO 8859-1. The impressive MSDOS Kermit by Joe Doupnik now also supports translation of PC code pages to ISO 8859-1. Thanks to Paul Placeway, Macintosh Kermit now supports ISO 8859-1 as an 8-bit line terminal. Others have taken over the job to complete it. I think I can speak on behalf of the international computing community and enthusiastically thank these people for a work most useful to them.

TCP/IP

Despite a mention I have read in an introduction to the TCP/IP communication protocols, "provision for hosts with different character sets", the idea does not extend much into the standards. In fact, some of them even restrict text to 7 bits explicitly, with no more reason than some points of forgotten history. No attempt is made to standardize what must be an 8-bit code so that it is common to all machines, just like ASCII is, as explained above.

In practice, it is often no more than a question of implementation: use ISO 8859 as the code of a machine or translate the proprietary code to ISO 8859.
At the time of writing the first version of this text, only EBCDIC mainframes did translate, because the need appeared evident; it was restricted to the US ASCII character set, but a simple table change extends the scope of all protocols. For users of international characters, the same problem and solution exist for any host not using ISO 8859.

As of this writing, the most important applications on the Macintosh have applied the principles: Eudora (POP3) by Steve Dorner, Brown tn3270 by Peter DiCamillo, Fetch (FTP client) by Jim Matthews, FTPd (FTP server) and other programs by Peter Lewis which cope with translation, as of today NCSA/BYU/UCL Telnet by Pascal Maes, of course Mac-X from Apple, and even others still to check. IBM PC, status quo: just Telnet by IBM (both vt100 and tn3270) and several other firms. Thanks to the authors!

The idea of translating the data does not come to the mind of the people who write TCP/IP applications because they don't know the problem. If the protocol speaks about it, the application will probably be written correctly in that respect. For example, the specifications of X-Windows state that ISO 8859-1 is the code that must be used to exchange text between the client and the server of that protocol; and all X-Windows applications are correct.

So, short of rewriting most RFCs just for this, what is needed is a general TCP/IP statement saying what single code TCP/IP application protocols use on communication lines: ISO 8859 with future migration to ISO 10646. This would be like adding a minimal presentation layer.

Specific TCP/IP cases.

Telnet.

Take the most basic VT100 implementation, treat the keyboard as explained above (translating keyboard input to ISO 8859), translate ISO to the local code before display and you've done it. No need to try to negotiate binary (I am told it even hurts, and binary has nothing to do in my mind with the fact that the text a particular terminal uses is 8-bit).
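The Telnet recipe amounts to two small filters around an otherwise unchanged VT100 emulation, as the following Python sketch suggests. It is a minimal sketch under assumptions: "cp850" is taken as the local code of the PC, the function names from_host and to_host are invented, octets in the range 80-9F are simply classified as controls, and the strict codec calls only cover characters that exist in both codes.

```python
LOCAL = "cp850"     # assumed code of the PC; ISO 8859-1 is used on the line

def from_host(data: bytes):
    """Yield ('control', octet) or ('text', local_octet) for each octet received."""
    for octet in data:
        if octet < 0x20 or octet == 0x7F or 0x80 <= octet <= 0x9F:
            yield ("control", octet)        # C0/C1: leave to the VT100 logic
        else:
            local = bytes([octet]).decode("latin-1").encode(LOCAL)
            yield ("text", local[0])        # translated just before display

def to_host(keys: bytes) -> bytes:
    """Translate keyboard-driver output (local code) to ISO 8859-1 for the line."""
    return keys.decode(LOCAL).encode("latin-1")

for kind, octet in from_host(b"caf\xe9\r\n"):
    print(kind, f"{octet:02X}h")
print(to_host("é".encode(LOCAL)))
```

A host that only ever sends ASCII goes through both filters unchanged, so the upgraded emulator behaves exactly like the plain one for users who never type an accented letter.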
Note that anyone afraid of the 8th bit can limit his typing to ASCII; his host will not return anything else to him and the upgraded program will behave exactly as before. Also note that ISO 8859 does not conflict with the 8-bit control characters and that using ISO is a simplification. There is no need to wonder or negotiate whether the host will send them: if any byte in the range 80-9F comes in, you may treat it as a control character.

Tn3270. Like IBM mainframes, it is forced to translate. So, it's just a matter of using the correct tables. You will save time by not trying to support all the EBCDIC CECPs. Using CECP 1047 will probably make everybody happy. However, make the translation customizable. If someone wants things differently, it will probably be a whole installation with time to customize it.

SMTP. Although RFC 821 restricts data to 7 bits, it works quite well with 8. We use 8-bit mail on Unix (Sun and IBM), on IBM mainframes and on Macintosh to the delight of our users. It's just a matter of not crossing gateways that strip the 8th bit. For the Internet, we do not use such hosts as less preferred MXes and we expect that sites wanting 8 bits will do the same. Together with many other sites, we use ISO 8859-1. No problem! So, that's just what is needed for the Internet: kill the 8th-bit killers or don't use them. Other networks should be expected to do the same with their mail and use the correct gateways with the Internet. The BITNET/Internet gateways, for example, should translate between ISO 8859-1 and CECP 1047.

The same general rules for translation as explained above for file transfer apply to FTP and other protocols. Note that the text vs binary distinction may need to be introduced in additional places. For example, NFS would benefit from using it (best at the file level).

General conclusions

1) Every effort should be made so that all operating systems' codes be unique and universal, i.e. ISO 8859-x for an 8-bit code, while waiting for the perfect unity of a single multibyte code.
2) Failing that, communication software must compensate for a particular system's weakness and translate data so that the system appears to the outside world to use the unique data interchange code.

3) Programmers must deal with 8-bit character codes (and prepare for multibyte ones).

Translation.

I have been looking for constructor-defined or most widely accepted complete tables, and I explain the reasons for the choices. However, I cannot guarantee that another translation will not be used someday. The data correspond to my explanations. That's all I can say.

DEC

Easy case first. DEC uses ISO 8859-1 (just a few characters of their 8-bit code, which pre-dates ISO 8859, are different). Nothing to do except making sure the 8 bits go through.

IBM translations

Since version 1 of this document, IBM has published the following "Character Data Representation Architecture" (CDRA) documents:

GC09-1392-00 Executive Overview
GC09-1390-00 Level 1, Reference
GC09-1391-00 Level 1, Registry

The last of these answers most of the earlier questions about translation. IBM has also published a new EBCDIC CECP 1047 that fulfills the requirements of compatibility with the previous de-facto EBCDIC. However, IBM has made no statement that I know of about support, nor about whether this code is intended to be the sole one for Latin-1 languages.

As a consequence of the SHARE requirement (the necessity of using a single compatible code on IBM mainframes), I think, as many people do, that only CECP 1047 should be used on EBCDIC mainframes and, by extension, only CP 850 on the PC (but ISO 8859-1 would be better). The PC may also use CP 437 (e.g. when 850 is not available), with use limited to a subset of the ISO character set. But, even when using CP 437, a PC should use the same translation to ISO as for CP 850. Only 4 characters need to translate differently, and those needing them are expected to use CP 850. The translation tables listed below are limited to these two codes (others are found in a separate file).
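
Since the tables are plain 256-entry byte maps, their invertibility can be checked mechanically and the inverse table derived rather than typed by hand. The following is a minimal sketch in Python; the helper names are mine and not part of any product, and the data is the upper half (80-FF) of the ISO 8859-1 to code page 850 table as listed in this document, with the lower half assumed to be the identity, as in that listing.

```python
# Upper half (ISO 8859-1 code points 80-FF) of the ISO 8859-1 -> CP 850 table.
ISO_TO_850_UPPER = """
BA CD C9 BB C8 BC CC B9 CB CA CE DF DC DB FE F2
B3 C4 DA BF C0 D9 C3 B4 C2 C1 C5 B0 B1 B2 D5 9F
FF AD BD 9C CF BE DD F5 F9 B8 A6 AE AA F0 A9 EE
F8 F1 FD FC EF E6 F4 FA F7 FB A7 AF AC AB F3 A8
B7 B5 B6 C7 8E 8F 92 80 D4 90 D2 D3 DE D6 D7 D8
D1 A5 E3 E0 E2 E5 99 9E 9D EB E9 EA 9A ED E8 E1
85 A0 83 C6 84 86 91 87 8A 82 88 89 8D A1 8C 8B
D0 A4 95 A2 93 E4 94 F6 9B 97 A3 96 81 EC E7 98
"""

def build_table(upper_half_hex: str) -> bytes:
    """Build a 256-byte table: identity for 00-7F, listed values for 80-FF."""
    upper = bytes.fromhex(" ".join(upper_half_hex.split()))
    if len(upper) != 128:
        raise ValueError("expected 128 byte values for the upper half")
    return bytes(range(128)) + upper

def is_invertible(table: bytes) -> bool:
    """A table is invertible when it is a permutation of the 256 code points."""
    return len(table) == 256 and len(set(table)) == 256

def invert(table: bytes) -> bytes:
    """Derive the inverse table, refusing tables that lose information."""
    if not is_invertible(table):
        raise ValueError("table is not invertible")
    inverse = bytearray(256)
    for source, target in enumerate(table):
        inverse[target] = source
    return bytes(inverse)

ISO_TO_850 = build_table(ISO_TO_850_UPPER)
C850_TO_ISO = invert(ISO_TO_850)

if __name__ == "__main__":
    every_byte = bytes(range(256))
    round_trip = every_byte.translate(ISO_TO_850).translate(C850_TO_ISO)
    print("round trip preserved:", round_trip == every_byte)
```

A homographic table, for instance one that sends both 2 superscript and plain 2 to the same byte, fails the `is_invertible` check, which is exactly the property that encoded data sent through a translating gateway depends on.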
A problem exists with the translation between code page 850 and ISO. As published in the CDRA registry, the translation of the ASCII part is not a null translation. This has simply been corrected below. But the IBM translation also fails to preserve round-trip integrity with the PC to EBCDIC translation published and used by IBM products (specifically, 850->500 is not 850->ISO->500). So, this table may be subject to change, unless IBM decides that the wrong table is the one between CECP 1047 and ISO, or unless they say nothing and do not mind that they have set up their Communication Manager wrong. The change would only affect the range 80-AF of the ISO control characters, though.

ISO 8859-1 to CECP 1047 (Extended de-facto EBCDIC):

00 01 02 03 37 2D 2E 2F 16 05 25 0B 0C 0D 0E 0F
10 11 12 13 3C 3D 32 26 18 19 3F 27 1C 1D 1E 1F
40 5A 7F 7B 5B 6C 50 7D 4D 5D 5C 4E 6B 60 4B 61
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 7A 5E 4C 7E 6E 6F
7C C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6
D7 D8 D9 E2 E3 E4 E5 E6 E7 E8 E9 AD E0 BD 5F 6D
79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96
97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 C0 4F D0 A1 07
20 21 22 23 24 15 06 17 28 29 2A 2B 2C 09 0A 1B
30 31 1A 33 34 35 36 08 38 39 3A 3B 04 14 3E FF
41 AA 4A B1 9F B2 6A B5 BB B4 9A 8A B0 CA AF BC
90 8F EA FA BE A0 B6 B3 9D DA 9B 8B B7 B8 B9 AB
64 65 62 66 63 67 9E 68 74 71 72 73 78 75 76 77
AC 69 ED EE EB EF EC BF 80 FD FE FB FC BA AE 59
44 45 42 46 43 47 9C 48 54 51 52 53 58 55 56 57
8C 49 CD CE CB CF CC E1 70 DD DE DB DC 8D 8E DF

inverted, CECP 1047 (Extended de-facto EBCDIC) to ISO 8859-1:

00 01 02 03 9C 09 86 7F 97 8D 8E 0B 0C 0D 0E 0F
10 11 12 13 9D 85 08 87 18 19 92 8F 1C 1D 1E 1F
80 81 82 83 84 0A 17 1B 88 89 8A 8B 8C 05 06 07
90 91 16 93 94 95 96 04 98 99 9A 9B 14 15 9E 1A
20 A0 E2 E4 E0 E1 E3 E5 E7 F1 A2 2E 3C 28 2B 7C
26 E9 EA EB E8 ED EE EF EC DF 21 24 2A 29 3B 5E
2D 2F C2 C4 C0 C1 C3 C5 C7 D1 A6 2C 25 5F 3E 3F
F8 C9 CA CB C8 CD CE CF CC 60 3A 23 40 27 3D 22
D8 61 62 63 64 65 66 67 68 69 AB BB F0 FD FE B1
B0 6A 6B 6C 6D 6E 6F 70 71 72 AA BA E6 B8 C6 A4
B5 7E 73 74 75 76 77 78 79 7A A1 BF D0 5B DE AE
AC A3 A5 B7 A9 A7 B6 BC BD BE DD A8 AF 5D B4 D7
7B 41 42 43 44 45 46 47 48 49 AD F4 F6 F2 F3 F5
7D 4A 4B 4C 4D 4E 4F 50 51 52 B9 FB FC F9 FA FF
5C F7 53 54 55 56 57 58 59 5A B2 D4 D6 D2 D3 D5
30 31 32 33 34 35 36 37 38 39 B3 DB DC D9 DA 9F

ISO 8859-1 to IBM PC code page 850:

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
BA CD C9 BB C8 BC CC B9 CB CA CE DF DC DB FE F2
B3 C4 DA BF C0 D9 C3 B4 C2 C1 C5 B0 B1 B2 D5 9F
FF AD BD 9C CF BE DD F5 F9 B8 A6 AE AA F0 A9 EE
F8 F1 FD FC EF E6 F4 FA F7 FB A7 AF AC AB F3 A8
B7 B5 B6 C7 8E 8F 92 80 D4 90 D2 D3 DE D6 D7 D8
D1 A5 E3 E0 E2 E5 99 9E 9D EB E9 EA 9A ED E8 E1
85 A0 83 C6 84 86 91 87 8A 82 88 89 8D A1 8C 8B
D0 A4 95 A2 93 E4 94 F6 9B 97 A3 96 81 EC E7 98

inverted, IBM PC code page 850 to ISO 8859-1:

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
C7 FC E9 E2 E4 E0 E5 E7 EA EB E8 EF EE EC C4 C5
C9 E6 C6 F4 F6 F2 FB F9 FF D6 DC F8 A3 D8 D7 9F
E1 ED F3 FA F1 D1 AA BA BF AE AC BD BC A1 AB BB
9B 9C 9D 90 97 C1 C2 C0 A9 87 80 83 85 A2 A5 93
94 99 98 96 91 9A E3 C3 84 82 89 88 86 81 8A A4
F0 D0 CA CB C8 9E CD CE CF 95 92 8D 8C A6 CC 8B
D3 DF D4 D2 F5 D5 B5 FE DE DA DB D9 FD DD AF B4
AD B1 8F BE B6 A7 F7 B8 B0 A8 B7 B9 B3 B2 8E A0

Apple Macintosh

Apple Inc.
remained silent to the request for an official translation table between ISO 8859-1 and the Macintosh code that would fulfill the data processing requirement of being invertible for the 256 code points. So, I built one and suggested that the Kermit repository store the data and be the reference for it.

I made the translation as compatible as possible with an existing translation table, the official "Apple File Exchange" (AFE) from Apple Inc., which translates between the IBM PC code and Apple's and hence, indirectly, to ISO 8859-1. Many characters of the Apple fonts belong to ISO 8859-1 and caused no problem. The translation of some characters became incompatible, because the "Apple File Exchange" is homographic, which fails to be invertible (e.g. 2 superscript translates to plain 2), and because the AFE is based on IBM PC 437, which contains some characters of the Macintosh set that have been replaced (giving IBM PC code page 850) with characters of ISO 8859-1 (for example, it matched the Mac Omega to a 437 Omega that became an 850 U circumflex, which now has to match the Mac's F3).

Where translations remained arbitrary, homographic or mnemonic choices were preferred. Leftovers from the 80-FF Mac range have simply been lined up in the 80-9F range of ISO 8859-1 without any particular reason.

This is a second version of the translation; 6 characters of the standard Apple code whose translation was arbitrary have been translated according to their Icelandic replacements (plus a change of the translation of the Apple code points to which these ISO characters translated).

Below, you will find comments about the choices (why):

Blank: compatible with AFE (same in both PC 437 and 850).
S: not in 437/AFE, but ISO character is in "Standard Apple Character Set"
E: same for "SACS with extensions" (on newer systems only).
I: translation according to an Icelandic Apple font.
A: arbitrary (but choice sometimes guided by lookalike or mnemonic aspects and a few characters of PC 437 will be preserved).

ISO  Mac  ISO 8859-1 name (IBM)    Why  Mac name (Paul Placeway)

80 | A5 |                         | A | bullet
81 | AA |                         | A | trade mark
82 | AD |                         | A | not equal
83 | B0 |                         | A | infinity
84 | B3 |                         | A | greater than or equal to
85 | B7 |                         | A | Uppercase Sigma (Summation)
86 | BA |                         | A | integral
87 | BD |                         | A | Uppercase Omega
88 | C3 |                         | A | radical (square root)
89 | C5 |                         | A | approx equal
8A | C9 |                         | A | ellipsis (...)
8B | D1 |                         | A | em dash
8C | D4 |                         | A | left singlequote ( ` )
8D | D9 |                         | A | Y dieresis
8E | DA |                         | A | divide (a / with less slope)
8F | B6 |                         | A | partial
90 | C6 |                         | A | Uppercase Delta
91 | CE |                         | A | OE
92 | E2 |                         | A | baseline single close quote
93 | E3 |                         | A | baseline double close quote
94 | E4 |                         | A | per thousand
95 | F0 |                         | A | (closed) Apple
96 | F6 |                         | A | circumflex
97 | F7 |                         | A | tilde
98 | F9 |                         | A | breve
99 | FA |                         | A | dot accent
9A | FB |                         | A | ring accent
9B | FD |                         | A | Hungarian umlaut
9C | FE |                         | A | ogonek
9D | FF |                         | A | caron
9E | F5 |                         | A | dotless i
9F | C4 |                         | A | florin
A0 | CA | required space          | A | non-printing space
A1 | C1 | exclamation point inv   |   | inverted !
A2 | A2 | cent sign               | S | cent
A3 | A3 | pound sign              |   | sterling
A4 | DB | int. currency symbol    | E | generic currency
A5 | B4 | Yen sign                | S | yen
A6 | CF | Vertical Line, Broken   | A | oe
A7 | A4 | section/paragraph symb  | S | section
A8 | AC | diaeresis,umlaut acc    | S | dieresis (AKA umlaut)
A9 | A9 | Copyright sign          |   | copyright ( (C) )
AA | BB | ordinal indicator fem   |   | feminine ordinal
AB | C7 | left angle quotes       |   | left guillemot (like << )
AC | C2 | logical NOT, EOL symb   |   | logical not
AD | D0 | Syllable Hyphen         | A | en dash
AE | A8 | Regist.Trade Mark sym   | S | registered ( (R) )
AF | F8 | overline                | A | macron
B0 | A1 | Degree Symbol           |   | superscript ring
B1 | B1 | plus or minus sign      |   | plus minus
B2 | D3 | 2 superscript           | A | right doublequote ( '' )
B3 | D2 | 3 superscript           | A | left doublequote ( `` )
B4 | AB | acute accent            | S | acute accent
B5 | B5 | micro symbol            |   | greek lowercase mu
B6 | A6 | paragraph symbol USA    | S | paragraph
B7 | E1 | Middle dot accent       | E | centered (small) dot
B8 | FC | cedilla accent          | E | cedilla
B9 | D5 | 1 superscript           | A | right singlequote ( ' )
BA | BC | ordinal indicator mas   |   | masculine ordinal
BB | C8 | right angle quotes      |   | right guillemot (like >> )
BC | B9 | one quarter             | A | lowercase pi
BD | B8 | one half                | A | Uppercase Pi (Power)
BE | B2 | three quarters          | A | less than or equal to
BF | C0 | Question mark inverted  |   | inverted ?
C0 | CB | A grave capital         | S | A grave
C1 | E7 | A acute capital         | E | A acute
C2 | E5 | A circumflex capital    | E | A circumflex
C3 | CC | A tilde capital         | S | A tilde
C4 | 80 | A diaeresis capital     |   | A dieresis
C5 | 81 | A overcircle capital    |   | A ring
C6 | AE | AE diphthong capital    |   | AE
C7 | 82 | C cedilla capital       |   | C cedilla
C8 | E9 | E grave capital         | E | E grave
C9 | 83 | E acute capital         |   | E acute
CA | E6 | E circumflex capital    | S | E circumflex
CB | E8 | E diaeresis capital     | E | E dieresis
CC | ED | I grave capital         | E | I grave
CD | EA | I acute capital         | E | I acute
CE | EB | I circumflex capital    | E | I circumflex
CF | EC | I diaeresis capital     | E | I dieresis
D0 | DC | Eth Icelandic capital   | I | < or Eth Icelandic capital
D1 | 84 | N tilde capital         |   | N tilde
D2 | F1 | O grave capital         | E | O grave
D3 | EE | O acute capital         | E | O acute
D4 | EF | O circumflex capital    | E | O circumflex
D5 | CD | O tilde capital         | S | O tilde
D6 | 85 | O diaeresis capital     |   | O dieresis
D7 | D7 | Multiply sign           | A | lozenge (open diamond)
D8 | AF | O slash capital         | E | O slash
D9 | F4 | U grave capital         | E | U grave
DA | F2 | U acute capital         | E | U acute
DB | F3 | U circumflex capital    | E | U circumflex
DC | 86 | U diaeresis capital     |   | U dieresis
DD | A0 | Y acute Capital         | I | dagger or Y acute Capital
DE | DE | Thorn Icelandic capital | I | fi or Thorn Icelandic capital
DF | A7 | sharp s small           |   | Es-sed (German double s)
E0 | 88 | a grave small           |   | a grave
E1 | 87 | a acute small           |   | a acute
E2 | 89 | a circumflex small      |   | a circumflex
E3 | 8B | a tilde small           | S | a tilde
E4 | 8A | a diaeresis small       |   | a dieresis
E5 | 8C | a overcircle small      |   | a ring
E6 | BE | ae diphthong small      |   | ae
E7 | 8D | c cedilla small         |   | c cedilla
E8 | 8F | e grave small           |   | e grave
E9 | 8E | e acute small           |   | e acute
EA | 90 | e circumflex small      |   | e circumflex
EB | 91 | e diaeresis small       |   | e dieresis
EC | 93 | i grave small           |   | i grave
ED | 92 | i acute small           |   | i acute
EE | 94 | i circumflex small      |   | i circumflex
EF | 95 | i diaeresis small       |   | i dieresis
F0 | DD | Eth Icelandic small     | I | > or Eth Icelandic small
F1 | 96 | n tilde small           |   | n tilde
F2 | 98 | o grave small           |   | o grave
F3 | 97 | o acute small           |   | o acute
F4 | 99 | o circumflex small      |   | o circumflex
F5 | 9B | o tilde small           | S | o tilde
F6 | 9A | o diaeresis small       |   | o dieresis
F7 | D6 | Divide sign             |   | divide
F8 | BF | o slash small           | S | o slash
F9 | 9D | u grave small           |   | u grave
FA | 9C | u acute small           |   | u acute
FB | 9E | u circumflex small      |   | u circumflex
FC | 9F | u diaeresis small       |   | u dieresis
FD | E0 | y acute small           | I | double dagger or y acute small
FE | DF | Thorn Icelandic small   | I | fl or Thorn Icelandic small
FF | D8 | y diaeresis small       |   | y dieresis

data 'taBL' (1001, "Translate In", purgeable) {
/* Translation from ISO 8859-1 to Macintosh extended code */
/*       x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
/*0x*/ $"00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F"
/*1x*/ $"10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F"
/*2x*/ $"20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F"
/*3x*/ $"30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F"
/*4x*/ $"40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F"
/*5x*/ $"50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F"
/*6x*/ $"60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F"
/*7x*/ $"70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F"
/*8x*/ $"A5 AA AD B0 B3 B7 BA BD C3 C5 C9 D1 D4 D9 DA B6"
/*9x*/ $"C6 CE E2 E3 E4 F0 F6 F7 F9 FA FB FD FE FF F5 C4"
/*Ax*/ $"CA C1 A2 A3 DB B4 CF A4 AC A9 BB C7 C2 D0 A8 F8"
/*Bx*/ $"A1 B1 D3 D2 AB B5 A6 E1 FC D5 BC C8 B9 B8 B2 C0"
/*Cx*/ $"CB E7 E5 CC 80 81 AE 82 E9 83 E6 E8 ED EA EB EC"
/*Dx*/ $"DC 84 F1 EE EF CD 85 D7 AF F4 F2 F3 86 A0 DE A7"
/*Ex*/ $"88 87 89 8B 8A 8C BE 8D 8F 8E 90 91 93 92 94 95"
/*Fx*/ $"DD 96 98 97 99 9B 9A D6 BF 9D 9C 9E 9F E0 DF D8"
};

data 'taBL' (1002, "Translate Out", purgeable) {
/* Translation from Macintosh extended code to ISO 8859-1 */
/*       x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
/*0x*/ $"00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F"
/*1x*/ $"10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F"
/*2x*/ $"20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F"
/*3x*/ $"30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F"
/*4x*/ $"40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F"
/*5x*/ $"50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F"
/*6x*/ $"60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F"
/*7x*/ $"70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F"
/*8x*/ $"C4 C5 C7 C9 D1 D6 DC E1 E0 E2 E4 E3 E5 E7 E9 E8"
/*9x*/ $"EA EB ED EC EE EF F1 F3 F2 F4 F6 F5 FA F9 FB FC"
/*Ax*/ $"DD B0 A2 A3 A7 80 B6 DF AE A9 81 B4 A8 82 C6 D8"
/*Bx*/ $"83 B1 BE 84 A5 B5 8F 85 BD BC 86 AA BA 87 E6 F8"
/*Cx*/ $"BF A1 AC 88 9F 89 90 AB BB 8A A0 C0 C3 D5 91 A6"
/*Dx*/ $"AD 8B B3 B2 8C B9 F7 D7 FF 8D 8E A4 D0 F0 DE FE"
/*Ex*/ $"FD B7 92 93 94 C2 CA C1 CB C8 CD CE CF CC D3 D4"
/*Fx*/ $"95 D2 DA DB D9 9E 96 97 AF 98 99 9A B8 9B 9C 9D"
};

ISO 8859-1

Here is a names list and graphic representation of the ISO 8859-1 code. The well-known ASCII part and control characters have been left out to shorten the text. The lists and pictures are included for practical programming help only. In particular, the "bitmaps" are nothing official. For convenience, two lists of names and acronyms are given: the first comes from IBM, the second from a list of characters of the standard ISO 6937.

Code point in hexadecimal / Acronym / Name. Origin: IBM.

A0 | SP30 | required space           D0 | LD62 | Eth Icelandic capital
A1 | SP03 | exclamation point inv    D1 | LN20 | N tilde capital
A2 | SC04 | cent sign                D2 | LO14 | O grave capital
A3 | SC02 | pound sign               D3 | LO12 | O acute capital
A4 | SC01 | int. currency symbol     D4 | LO16 | O circumflex capital
A5 | SC05 | Yen sign                 D5 | LO20 | O tilde capital
A6 | SM65 | Vertical Line, Broken    D6 | LO18 | O diaeresis capital
A7 | SM24 | section/paragraph symb   D7 | SA07 | Multiply sign
A8 | SD17 | diaeresis,umlaut acc     D8 | LO62 | O slash capital
A9 | SM52 | Copyright sign           D9 | LU14 | U grave capital
AA | SM21 | ordinal indicator fem    DA | LU12 | U acute capital
AB | SP17 | left angle quotes        DB | LU16 | U circumflex capital
AC | SM66 | logical NOT, EOL symb    DC | LU18 | U diaeresis capital
AD | SP32 | Syllable Hyphen          DD | LY12 | Y acute Capital
AE | SM53 | Regist.Trade Mark sym    DE | LT64 | Thorn Icelandic capital
AF | SM15 | overline                 DF | LS61 | sharp s small
B0 | SM19 | Degree Symbol            E0 | LA13 | a grave small
B1 | SA02 | plus or minus sign       E1 | LA11 | a acute small
B2 | ND021| 2 superscript            E2 | LA15 | a circumflex small
B3 | ND031| 3 superscript            E3 | LA19 | a tilde small
B4 | SD11 | acute accent             E4 | LA17 | a diaeresis small
B5 | SM17 | micro symbol             E5 | LA27 | a overcircle small
B6 | SM25 | paragraph symbol USA     E6 | LA51 | ae diphthong small
B7 | SD63 | Middle dot accent        E7 | LC41 | c cedilla small
B8 | SD41 | cedilla accent           E8 | LE13 | e grave small
B9 | ND011| 1 superscript            E9 | LE11 | e acute small
BA | SM20 | ordinal indicator mas    EA | LE15 | e circumflex small
BB | SP18 | right angle quotes       EB | LE17 | e diaeresis small
BC | NF04 | one quarter              EC | LI13 | i grave small
BD | NF01 | one half                 ED | LI11 | i acute small
BE | NF05 | three quarters           EE | LI15 | i circumflex small
BF | SP16 | Question mark inverted   EF | LI17 | i diaeresis small
C0 | LA14 | A grave capital          F0 | LD63 | Eth Icelandic small
C1 | LA12 | A acute capital          F1 | LN19 | n tilde small
C2 | LA16 | A circumflex capital     F2 | LO13 | o grave small
C3 | LA20 | A tilde capital          F3 | LO11 | o acute small
C4 | LA18 | A diaeresis capital      F4 | LO15 | o circumflex small
C5 | LA28 | A overcircle capital     F5 | LO19 | o tilde small
C6 | LA52 | AE diphthong capital     F6 | LO17 | o diaeresis small
C7 | LC42 | C cedilla capital        F7 | SA06 | Divide sign
C8 | LE14 | E grave capital          F8 | LO61 | o slash small
C9 | LE12 | E acute capital          F9 | LU13 | u grave small
CA | LE16 | E circumflex capital     FA | LU11 | u acute small
CB | LE18 | E diaeresis capital      FB | LU15 | u circumflex small
CC | LI14 | I grave capital          FC | LU17 | u diaeresis small
CD | LI12 | I acute capital          FD | LY11 | y acute small
CE | LI16 | I circumflex capital     FE | LT63 | Thorn Icelandic small
CF | LI18 | I diaeresis capital      FF | LY17 | y diaeresis small

Names and slightly different acronyms from the ISO 6937 repertoire

A0 SP31 NO-BREAK SPACE
A1 SP03 INVERTED EXCLAMATION MARK
A2 SC04 CENT SIGN
A3 SC02 POUND SIGN
A4 SC01 CURRENCY SIGN
A5 SC05 YEN SIGN
A6 SM65 BROKEN BAR
A7 SM24 PARAGRAPH SIGN
A8 SD17 DIAERESIS
A9 SM52 COPYRIGHT SIGN
AA SM21 FEMININE ORDINAL INDICATOR
AB SP17 LEFT POINTING DOUBLE ANGLE QUOTATION MARK
AC SM66 NOT SIGN
AD SP32 SOFT HYPHEN
AE SM53 REGISTERED TRADE MARK SIGN
AF SD31 MACRON
B0 SM19 DEGREE SIGN
B1 SA02 PLUS-MINUS SIGN
B2 NS02 SUPERSCRIPT TWO
B3 NS03 SUPERSCRIPT THREE
B4 SD11 ACUTE ACCENT
B5 SM17 MICRO SIGN
B6 SM25 PILCROW SIGN
B7 SM26 MIDDLE DOT
B8 SD41 CEDILLA
B9 NS01 SUPERSCRIPT ONE
BA SM20 MASCULINE ORDINAL INDICATOR
BB SP18 RIGHT POINTING DOUBLE ANGLE QUOTATION MARK
BC NF04 VULGAR FRACTION ONE-QUARTER
BD NF01 VULGAR FRACTION ONE-HALF
BE NF05 VULGAR FRACTION THREE-QUARTERS
BF SP16 INVERTED QUESTION MARK
C0 LA14 LATIN CAPITAL LETTER A WITH GRAVE ACCENT
C1 LA12 LATIN CAPITAL LETTER A WITH ACUTE ACCENT
C2 LA16 LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
C3 LA20 LATIN CAPITAL LETTER A WITH TILDE
C4 LA18 LATIN CAPITAL LETTER A WITH DIAERESIS
C5 LA28 LATIN CAPITAL LETTER A WITH RING ABOVE
C6 LA52 LATIN CAPITAL LIGATURE AE
C7 LC42 LATIN CAPITAL LETTER C WITH CEDILLA
C8 LE14 LATIN CAPITAL LETTER E WITH GRAVE ACCENT
C9 LE12 LATIN CAPITAL LETTER E WITH ACUTE ACCENT
CA LE16 LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
CB LE18 LATIN CAPITAL LETTER E WITH DIAERESIS
CC LI14 LATIN CAPITAL LETTER I WITH GRAVE ACCENT
CD LI12 LATIN CAPITAL LETTER I WITH ACUTE ACCENT
CE LI16 LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
CF LI18 LATIN CAPITAL LETTER I WITH DIAERESIS
D0 LD62 LATIN CAPITAL LETTER D WITH STROKE
D1 LN20 LATIN CAPITAL LETTER N WITH TILDE
D2 LO14 LATIN CAPITAL LETTER O WITH GRAVE ACCENT
D3 LO12 LATIN CAPITAL LETTER O WITH ACUTE ACCENT
D4 LO16 LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
D5 LO20 LATIN CAPITAL LETTER O WITH TILDE
D6 LO18 LATIN CAPITAL LETTER O WITH DIAERESIS
D7 SA07 MULTIPLICATION SIGN
D8 LO62 LATIN CAPITAL LETTER O WITH OBLIQUE STROKE
D9 LU14 LATIN CAPITAL LETTER U WITH GRAVE ACCENT
DA LU12 LATIN CAPITAL LETTER U WITH ACUTE ACCENT
DB LU16 LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
DC LU18 LATIN CAPITAL LETTER U WITH DIAERESIS
DD LY12 LATIN CAPITAL LETTER Y WITH ACUTE ACCENT
DE LT64 LATIN CAPITAL LETTER ICELANDIC THORN
DF LS61 LATIN SMALL LETTER GERMAN SHARP S
E0 LA13 LATIN SMALL LETTER A WITH GRAVE ACCENT
E1 LA11 LATIN SMALL LETTER A WITH ACUTE ACCENT
E2 LA15 LATIN SMALL LETTER A WITH CIRCUMFLEX ACCENT
E3 LA19 LATIN SMALL LETTER A WITH TILDE
E4 LA17 LATIN SMALL LETTER A WITH DIAERESIS
E5 LA27 LATIN SMALL LETTER A WITH RING ABOVE
E6 LA51 LATIN SMALL LIGATURE AE
E7 LC41 LATIN SMALL LETTER C WITH CEDILLA
E8 LE13 LATIN SMALL LETTER E WITH GRAVE ACCENT
E9 LE11 LATIN SMALL LETTER E WITH ACUTE ACCENT
EA LE15 LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT
EB LE17 LATIN SMALL LETTER E WITH DIAERESIS
EC LI13 LATIN SMALL LETTER I WITH GRAVE ACCENT
ED LI11 LATIN SMALL LETTER I WITH ACUTE ACCENT
EE LI15 LATIN SMALL LETTER I WITH CIRCUMFLEX ACCENT
EF LI17 LATIN SMALL LETTER I WITH DIAERESIS
F0 LD63 LATIN SMALL LETTER ICELANDIC ETH
F1 LN19 LATIN SMALL LETTER N WITH TILDE
F2 LO13 LATIN SMALL LETTER O WITH GRAVE ACCENT
F3 LO11 LATIN SMALL LETTER O WITH ACUTE ACCENT
F4 LO15 LATIN SMALL LETTER O WITH CIRCUMFLEX ACCENT
F5 LO19 LATIN SMALL LETTER O WITH TILDE
F6 LO17 LATIN SMALL LETTER O WITH DIAERESIS
F7 SA06 DIVISION SIGN
F8 LO61 LATIN SMALL LETTER O WITH OBLIQUE STROKE
F9 LU13 LATIN SMALL LETTER U WITH GRAVE ACCENT
FA LU11 LATIN SMALL LETTER U WITH ACUTE ACCENT
FB LU15 LATIN SMALL LETTER U WITH CIRCUMFLEX ACCENT
FC LU17 LATIN SMALL LETTER U WITH DIAERESIS
FD LY11 LATIN SMALL LETTER Y WITH ACUTE ACCENT
FE LT63 LATIN SMALL LETTER ICELANDIC THORN
FF LY17 LATIN SMALL LETTER Y WITH DIAERESIS

ISO 8859-1 by [coarse, bandwidth saving] pictures

[Figure: coarse bitmap pictures of the ISO 8859-1 characters A0 to FF, eight code points per row]

Andr'e PIRARD
SEGI Univ. de Li`ege
B26 - Sart Tilman
B-4000 Li`ege 1 (Belgium)
PIRARD@BLIULG11 on EARN alias BITNET
pirard@vm1.ulg.ac.be on Internet